Adapting a compressed model for use in speech recognition

ABSTRACT

A speech recognition system includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an adaptor component that selectively adapts parameters of a compressed model used to recognize at least a portion of the distorted speech utterance, wherein the adaptor component selectively adapts the parameters of the compressed model based at least in part upon the received distorted speech utterance.

BACKGROUND

Speech recognition has been the subject of a significant amount of research and commercial development. For example, speech recognition systems have been incorporated into mobile telephones, desktop computers, automobiles, and the like in order to provide a particular response to speech input provided by a user. For instance, in a mobile telephone equipped with speech recognition technology, a user can speak a name of a contact listed in the mobile telephone and the mobile telephone can initiate a call to the contact. Furthermore, many companies are currently using speech recognition technology to aid customers in connection with identifying employees of a company, identifying problems with a product or service, etc.

As noted above, speech recognition systems have been incorporated into mobile devices, such as mobile telephones. Robust speech recognition systems, however, typically employ models that require a relatively large amount of storage and/or processing resources, which are constrained by the size of mobile devices. As a result, speech recognition systems in mobile devices have been associated with reduced functionality (in comparison to robust speech recognition systems) and/or relatively poor performance, particularly in noisy environments.

SUMMARY

The following is a brief summary of subject matter that is described in greater detail herein. This summary is not intended to be limiting as to the scope of the claims.

Technologies pertaining to speech recognition in general, and technologies pertaining to adapting parameters of compressed models used in speech recognition systems in mobile devices in particular, are described herein. In an example, a speech recognition system can include a plurality of compressed models, wherein at least one of the compressed models can be a Hidden Markov Model.

In an example, a speech utterance can be received at a mobile device (or other device with memory and/or processing constraints) through use of one or more microphones. The speech utterance can have various distortions, including additive distortions and convolutive distortions. At least one of the plurality of compressed models in the speech recognition system can be adapted based at least in part upon the distorted speech utterance.

In an example, coarse estimates of various parameters pertaining to uttered speech can be ascertained through analysis of a received, distorted speech utterance. For instance, an additive distortion mean vector can be estimated by analyzing samples of a first plurality of (speech-free) frames of the received distorted speech utterance. Such additive distortion mean vector, for example, can be used to initially adapt at least a first subset of compressed models in the plurality of compressed models, or parameters thereof. The received distorted speech utterance can then be decoded through use of the plurality of compressed models, and thereafter at least one of the various parameters estimated above can be more accurately estimated (re-estimated) based at least in part upon results of decoding the speech utterance. A second subset of compressed models in the plurality of compressed models can then be adapted based at least in part upon the at least one re-estimated parameter. In another example, each of the plurality of compressed models can be adapted based at least in part upon the at least one re-estimated parameter.

Other aspects will be appreciated upon reading and understanding the attached figures and description.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a functional block diagram of an example system that facilitates adapting parameters of a compressed model in a speech recognition system.

FIG. 2 is a functional block diagram of an example system that facilitates adapting parameters of a compressed model in a speech recognition system.

FIG. 3 is a functional block diagram of an example system that facilitates compressing a model that is configured for use in a speech recognition system.

FIG. 4 is a functional block diagram of an example system that facilitates adapting parameters of a compressed model in a speech recognition system.

FIG. 5 is a flow diagram that illustrates an example methodology for adapting parameters of a compressed model in a speech recognition system.

FIGS. 6 and 7 depict a flow diagram that illustrates an example methodology for adapting compressed models in a speech recognition system.

FIGS. 8 and 9 depict a flow diagram that illustrates an example methodology for selectively adapting compressed models in a speech recognition system.

FIGS. 10 and 11 depict a flow diagram that illustrates an example methodology for selectively adapting compressed models in a speech recognition system.

FIG. 12 is a flow diagram that illustrates an example methodology for adapting parameters of a compressed model in a speech recognition system.

FIG. 13 is an example computing system.

DETAILED DESCRIPTION

Various technologies pertaining to speech recognition will now be described with reference to the drawings, where like reference numerals represent like elements throughout. In addition, several functional block diagrams of example systems are illustrated and described herein for purposes of explanation; however, it is to be understood that functionality that is described as being carried out by certain system components may be performed by multiple components. Similarly, for instance, a component may be configured to perform functionality that is described as being carried out by multiple components.

With reference to FIG. 1, an example speech recognition system 100 that facilitates adapting parameters of a compressed model for utilization in connection with speech recognition is illustrated. In an example, the system 100 can be well-suited for use in a mobile device, such as a mobile telephone, a personal digital assistant, a smart phone, or other suitable mobile device. It is to be understood, however, that the system 100 can also be well-suited for use in conventional computing devices, such as desktop computers, computers in automobiles, or other devices where hands-free communication may be desirable.

The system 100 includes a receiver component 102 that receives a distorted speech utterance from an individual. Thus, the receiver component 102 can be in communication with a microphone or series of microphones (not shown). The distorted speech utterance can include additive and convolutive distortions. In an example, an additive distortion can be or include background noise, such as a fan running in the background, road noise from an automobile, etc. A convolutive distortion can be channel noise, such as a change in position of a microphone, an alteration from a hand-held to a hands-free microphone, or other suitable channel noise. As noted above, the distorted speech utterance can include both additive and convolutive distortions.

The system 100 further includes an adaptor component 104 that can adapt parameters of a compressed model 106 that is used in connection with recognizing at least a portion of the distorted speech utterance. For instance, the adaptor component 104 can adapt parameters of the compressed model 106 based at least in part upon the distorted speech utterance received by the receiver component 102. Pursuant to an example, the adaptor component 104 can be configured to jointly compensate for additive and convolutive distortions in the received speech utterance when adapting the parameters of the compressed model 106.

In an example, and as will be described in greater detail below, the compressed model 106 can be a compressed Hidden Markov Model (HMM). The HMM can be trained using clean speech and/or speech that includes one or more distortions. Further, the HMM can be compressed using subspace coding (SSC). It is to be understood, however, that other compression techniques are also contemplated and are intended to fall under the scope of the hereto-appended claims. In yet another example, the adaptor component 104 can jointly take into consideration additive and convolutive distortion in connection with adapting parameters of the compressed model 106.

Furthermore, the adaptor component 104 can utilize various techniques for adapting the compressed model 106 (and other compressed models) in connection with conserving storage (e.g., memory) and/or computing resources. For instance, as will be described herein, the adaptor component 104 can adapt parameters of the compressed model 106 only if the compressed model 106 is used in connection with recognizing at least a portion of the distorted speech utterance received by the receiver component 102.

Referring now to FIG. 2, an example system 200 that facilitates adapting parameters of a model used to recognize speech utterances is illustrated. The system 200 includes the receiver component 102 and the adaptor component 104, which act as described above. The system 200 also includes a model 202 that can be used to model multiple distortions in the distorted speech utterance received by the receiver component 102. For instance, the model 202 can be a nonlinear model that is configured to model additive distortions and convolutive distortions in the received distorted speech utterance without considering the phase of the distortions. In another example, the model 202 can be a nonlinear phase-sensitive model of additive and convolutive distortions that takes the phase of the distortions into consideration.

For example, the distorted speech utterance can be referred to as y[m], and can be generated from a clean speech signal x[m] with noise n[m] (e.g., additive distortion) and the channel's impulse response h[m] (e.g., convolutive distortion) according to the following:

y[m]=x[m]*h[m]+n[m]  (1)

where * denotes convolution. The corresponding vector-valued distorted speech, clean speech, additive distortion, and convolutive distortion in the Mel-frequency cepstral coefficients (MFCC) domain are denoted y, x, n, and h, respectively.

Such a model can be used to ascertain an additive distortion mean vector μ_n and a convolutive distortion mean vector μ_h (e.g., in the MFCC domain).
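To make Equation (1) concrete, the following minimal Python sketch applies it to short discrete-time signals. The function name `distort` is hypothetical, and truncating the convolution output to the length of x[m] is a simplifying assumption rather than part of the described system.

```python
from typing import List, Sequence


def distort(x: Sequence[float], h: Sequence[float], n: Sequence[float]) -> List[float]:
    """Return y[m] = (x * h)[m] + n[m] for m = 0 .. len(x) - 1.

    x: clean speech samples; h: channel impulse response taps;
    n: additive noise samples (assumed to have the same length as x).
    """
    if len(n) != len(x):
        raise ValueError("noise must have the same length as the clean signal")
    y = []
    for m in range(len(x)):
        # Linear convolution: sum the channel taps that overlap sample m.
        conv = sum(h[i] * x[m - i] for i in range(len(h)) if m - i >= 0)
        y.append(conv + n[m])
    return y


# A two-tap channel plus a constant noise floor.
y = distort([1.0, 2.0, 3.0], [1.0, 0.5], [0.1, 0.1, 0.1])
print(y)
```

The second tap of h[m] smears each clean sample into the next output sample, which is why a convolutive distortion cannot be removed by subtracting a noise estimate alone.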

In another example, the model 202 can be a nonlinear phase-sensitive model of multiple distortions in uttered speech. As noted above, Equation (1) can be used to represent a distorted speech utterance. Applying the discrete Fourier transform (DFT), the following equivalent relation can be established in the frequency domain:

Y[k]=X[k]H[k]+N[k],   (2)

where k is a frequency-bin index in the DFT given a fixed-length time window. The power spectrum of the distorted speech can then be expressed as follows:

$\begin{matrix}{{|Y[k]|}^{2} = {|X[k]|}^{2}{|H[k]|}^{2} + {|N[k]|}^{2} + 2{|X[k]|}{|H[k]|}{|N[k]|}\cos\theta_{k},} & (3)\end{matrix}$

where θ_k denotes a (random) angle between the two complex variables N[k] and X[k]H[k].

By applying a set of L Mel-scale filters to the power spectrum in Equation (3), the l-th Mel filter-bank energies can be obtained for distorted speech, clean speech, additive distortion, and convolutive distortion, respectively, as follows:

$\begin{matrix}{{{\overset{\sim}{Y}}^{(l)}}^{2} = {\sum\limits_{k}{W_{k}^{(l)}{{Y\lbrack k\rbrack}}^{2}}}} & (4) \\{{{\overset{\sim}{X}}^{(l)}}^{2} = {\sum\limits_{k}{W_{k}^{(l)}{{X\lbrack k\rbrack}}^{2}}}} & (5) \\{{{\overset{\sim}{N}}^{(l)}}^{2} = {\sum\limits_{k}{W_{k}^{(l)}{{N\lbrack k\rbrack}}^{2}}}} & (6) \\{{{\overset{\sim}{H}}^{(l)}}^{2} = \frac{\sum\limits_{k}{W_{k}^{(l)}{{X\lbrack k\rbrack}}^{2}{{H\lbrack k\rbrack}}^{2}}}{{{\overset{\sim}{X}}^{(l)}}^{2}}} & (7)\end{matrix}$

where the l-th filter can be characterized by the transfer function W_k^{(l)} ≥ 0, with Σ_k W_k^{(l)} = 1.

The phase factor α^{(l)} of the l-th Mel filter bank can be defined as:

$\begin{matrix}{\alpha^{(l)} = \frac{\sum_{k}W_{k}^{(l)}{|X[k]|}{|H[k]|}{|N[k]|}\cos\theta_{k}}{{|\tilde{X}^{(l)}|}{|\tilde{H}^{(l)}|}{|\tilde{N}^{(l)}|}}} & (8)\end{matrix}$

Given the above, the following relation can be obtained in the Mel filter-bank domain for the l-th Mel filter-bank output:

$\begin{matrix}{{|\tilde{Y}^{(l)}|}^{2} = {|\tilde{X}^{(l)}|}^{2}{|\tilde{H}^{(l)}|}^{2} + {|\tilde{N}^{(l)}|}^{2} + 2\alpha^{(l)}{|\tilde{X}^{(l)}|}{|\tilde{H}^{(l)}|}{|\tilde{N}^{(l)}|}} & (9)\end{matrix}$
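Equation (9) follows from Equations (2) through (8) without approximation once α^{(l)} is defined by Equation (8). The Python sketch below checks this numerically for a single Mel filter, using random complex spectra and random normalized weights as stand-ins for real speech, channel, and noise; all values and the helper names `mel_filter_terms` and `random_bin` are illustrative assumptions. It relies on the identity |X[k]||H[k]||N[k]|cos θ_k = Re(X[k]H[k]·conj(N[k])).

```python
import cmath
import math
import random


def mel_filter_terms(X, H, N, W):
    """Equations (2) and (4)-(8) for one Mel filter.

    X, H, N: complex DFT values for the bins k covered by the filter.
    W: non-negative filter weights W_k^(l) that sum to 1.
    Returns (|Y~|^2, |X~|^2, |H~|^2, |N~|^2, alpha).
    """
    Y = [x * h + n for x, h, n in zip(X, H, N)]                          # Eq. (2)
    y2 = sum(w * abs(y) ** 2 for w, y in zip(W, Y))                      # Eq. (4)
    x2 = sum(w * abs(x) ** 2 for w, x in zip(W, X))                      # Eq. (5)
    n2 = sum(w * abs(n) ** 2 for w, n in zip(W, N))                      # Eq. (6)
    h2 = sum(w * abs(x) ** 2 * abs(h) ** 2 for w, x, h in zip(W, X, H)) / x2  # Eq. (7)
    # |X||H||N| cos(theta_k) equals Re(X[k] H[k] conj(N[k])).
    cross = sum(w * (x * h * n.conjugate()).real for w, x, h, n in zip(W, X, H, N))
    alpha = cross / math.sqrt(x2 * h2 * n2)                             # Eq. (8)
    return y2, x2, h2, n2, alpha


rng = random.Random(0)


def random_bin():
    """Random complex DFT value with magnitude in [0.1, 2] and random phase."""
    return cmath.rect(rng.uniform(0.1, 2.0), rng.uniform(-math.pi, math.pi))


K = 8                                   # DFT bins under this filter (arbitrary)
X = [random_bin() for _ in range(K)]
H = [random_bin() for _ in range(K)]
N = [random_bin() for _ in range(K)]
raw = [rng.uniform(0.1, 1.0) for _ in range(K)]
W = [r / sum(raw) for r in raw]

y2, x2, h2, n2, alpha = mel_filter_terms(X, H, N, W)
rhs = x2 * h2 + n2 + 2.0 * alpha * math.sqrt(x2 * h2 * n2)              # Eq. (9)
print(abs(y2 - rhs) < 1e-9, -1.0 <= alpha <= 1.0)
```

By the Cauchy-Schwarz inequality, α^{(l)} defined this way always lies in [-1, 1], which the second check confirms for the sampled values.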

Further, a phase-factor vector for all L Mel filter banks can be defined as follows:

$\begin{matrix}{\alpha = \left[\alpha^{(1)}, \alpha^{(2)}, \ldots, \alpha^{(l)}, \ldots, \alpha^{(L)}\right]^{T}} & (10)\end{matrix}$

Taking the logarithm of both sides of Equation (9) for all L Mel filter banks, and multiplying both sides by the non-square discrete cosine transform (DCT) matrix C, yields the following nonlinear, phase-sensitive distortion model in the cepstral domain:

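$\begin{matrix}{y = x + h + C\log\left(1 + \exp\left(C^{-1}(n - x - h)\right) + 2\alpha\cdot\exp\left(C^{-1}(n - x - h)/2\right)\right) = x + h + g_{\alpha}(x,h,n),} & (11)\end{matrix}$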

where

$\begin{matrix}{g_{\alpha}(x,h,n) = C\log\left(1 + \exp\left(C^{-1}(n - x - h)\right) + 2\alpha\cdot\exp\left(C^{-1}(n - x - h)/2\right)\right)} & (12)\end{matrix}$

and C⁻¹ is the (pseudo) inverse DCT matrix, and y, x, n, and h are the vector-valued distorted speech, clean speech, additive distortion, and convolutive distortion, respectively, all in the Mel-frequency cepstral coefficients (MFCC) domain. The · operation for two vectors denotes an element-wise product, and each exponentiation and logarithm of a vector above is also an element-wise operation. Again, such a model can be used to ascertain an additive distortion mean vector μ_n and a convolutive distortion mean vector μ_h (e.g., in the MFCC domain).
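The following Python sketch evaluates Equations (11) and (12) for a single frame. It assumes C is a truncated orthonormal DCT-II, so that its transpose serves as the pseudo-inverse C⁻¹; actual front ends may scale the DCT differently. The dimensions, vectors, and function names are hypothetical.

```python
import math


def dct_matrix(num_ceps, num_filters):
    """Truncated orthonormal DCT-II matrix C (num_ceps rows x num_filters columns)."""
    return [[math.sqrt((1.0 if i == 0 else 2.0) / num_filters)
             * math.cos(math.pi * i * (2 * m + 1) / (2 * num_filters))
             for m in range(num_filters)] for i in range(num_ceps)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matvec(a, v):
    return [sum(p * q for p, q in zip(row, v)) for row in a]


def g_alpha(x, h, n, alpha, c, c_inv):
    """Equation (12): C log(1 + exp(u) + 2 alpha . exp(u / 2)), u = C^-1 (n - x - h)."""
    u = matvec(c_inv, [ni - xi - hi for xi, hi, ni in zip(x, h, n)])
    # Element-wise exp and log, one value per Mel filter.
    inner = [math.log(1.0 + math.exp(ul) + 2.0 * al * math.exp(ul / 2.0))
             for ul, al in zip(u, alpha)]
    return matvec(c, inner)


def distorted_cepstrum(x, h, n, alpha, c, c_inv):
    """Equation (11): y = x + h + g_alpha(x, h, n)."""
    return [xi + hi + gi for xi, hi, gi in zip(x, h, g_alpha(x, h, n, alpha, c, c_inv))]


L, D = 8, 4                    # assumed: 8 Mel filters, 4 cepstral coefficients
C = dct_matrix(D, L)
C_INV = transpose(C)           # C has orthonormal rows, so C^T is its pseudo-inverse
x = [1.0, -0.5, 0.2, 0.1]
h = [0.3, 0.1, 0.0, -0.05]
n = [xi + hi for xi, hi in zip(x, h)]   # noise exactly as strong as x + h
alpha = [0.2] * L
y = distorted_cepstrum(x, h, n, alpha, C, C_INV)
print([round(v, 4) for v in y])
```

Because n - x - h = 0 in this example, every Mel channel contributes log(2 + 2α) = log(2.4); after the orthonormal DCT only the zeroth cepstral coefficient changes, by √L·log(2.4). Setting α to all zeros recovers the phase-insensitive form used in Equation (14).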

The adaptor component 104 can adapt parameters of the compressed model 106 based at least in part upon the output of the model 202. For instance, the adaptor component 104 can adapt parameters of the compressed model 106 based at least in part upon the additive distortion mean vector μ_n and the convolutive distortion mean vector μ_h.

Pursuant to an example, the adaptor component 104 can linearize the output of the model 202 in the case where the model is nonlinear but does not contemplate phase. For instance, the adaptor component 104 can use a vector Taylor series (VTS) algorithm to obtain a first-order approximation with respect to x, n, and h. In an example, for a given additive distortion mean vector μ_n and convolutive distortion mean vector μ_h, a matrix G(.) that depends on μ_x can be defined for the k-th Gaussian in the j-th state of the compressed model 106 as follows:

$\begin{matrix}{{{G\left( {j,k} \right)} = {C \cdot {{diag}\left( \frac{1}{1 + {\exp \left( {C^{- 1}\left( {\mu_{n} - \mu_{x,{jk}} - \mu_{h}} \right)} \right)}} \right)} \cdot C^{- 1}}},} & (13)\end{matrix}$

where C can be a non-square discrete cosine transform (DCT) matrix, C⁻¹ can be a (pseudo) inverse DCT matrix, and diag(.) denotes a diagonal matrix whose diagonal elements equal the elements of the vector in the argument. Each division of a vector is also an element-wise operation.

The adaptor component 104 can then adapt the Gaussian mean vector for the k-th Gaussian in the j-th state of the compressed model 106, for instance, using the following:

$\begin{matrix}{\mu_{y,jk} \approx \mu_{x,jk} + \mu_{h} + C\log\left(1 + \exp\left(C^{-1}(\mu_{n} - \mu_{x,jk} - \mu_{h})\right)\right)} & (14)\end{matrix}$

Further, the adaptor component 104 can adapt a covariance matrix in the compressed model 106 using the following:

$\begin{matrix}{\Sigma_{y,jk} \approx G(j,k)\Sigma_{x,jk}G(j,k)^{T} + (I - G(j,k))\Sigma_{n}(I - G(j,k))^{T}} & (15)\end{matrix}$

where Σ_{x,jk} is a covariance matrix for clean speech for the k-th Gaussian in the j-th state of the compressed model 106, and where Σ_n is a covariance matrix for additive distortion.
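A minimal Python sketch of Equations (13) through (15) for one Gaussian follows, under the assumption of a truncated orthonormal DCT-II for C (so that C⁻¹ = Cᵀ). The toy means, covariances, and helper names are illustrative only.

```python
import math


def dct_matrix(num_ceps, num_filters):
    """Truncated orthonormal DCT-II matrix C (num_ceps x num_filters)."""
    return [[math.sqrt((1.0 if i == 0 else 2.0) / num_filters)
             * math.cos(math.pi * i * (2 * m + 1) / (2 * num_filters))
             for m in range(num_filters)] for i in range(num_ceps)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(a, v):
    return [sum(p * q for p, q in zip(row, v)) for row in a]


def mat_add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def identity_minus(g):
    return [[(1.0 if i == j else 0.0) - g[i][j] for j in range(len(g))] for i in range(len(g))]


def vts_adapt_static(mu_x, sigma_x, mu_h, mu_n, sigma_n, c, c_inv):
    """Equations (13)-(15) for one Gaussian (j, k); returns (mu_y, sigma_y)."""
    u = matvec(c_inv, [n - x - h for x, h, n in zip(mu_x, mu_h, mu_n)])
    d = [1.0 / (1.0 + math.exp(ul)) for ul in u]
    g = matmul([[cij * dj for cij, dj in zip(row, d)] for row in c], c_inv)  # Eq. (13)
    # Eq. (14): log1p(exp(u)) computes log(1 + exp(u)) element-wise.
    shift = matvec(c, [math.log1p(math.exp(ul)) for ul in u])
    mu_y = [x + h + s for x, h, s in zip(mu_x, mu_h, shift)]
    i_g = identity_minus(g)
    # Eq. (15): G Sigma_x G^T + (I - G) Sigma_n (I - G)^T.
    sigma_y = mat_add(matmul(matmul(g, sigma_x), transpose(g)),
                      matmul(matmul(i_g, sigma_n), transpose(i_g)))
    return mu_y, sigma_y


D, L = 4, 8                       # assumed cepstral and Mel dimensions
C = dct_matrix(D, L)
C_INV = transpose(C)              # orthonormal rows: C^T is the pseudo-inverse
mu_x = [1.0, -0.5, 0.2, 0.1]
mu_h = [0.3, 0.1, 0.0, -0.05]
sigma_x = [[0.5 if i == j else 0.0 for j in range(D)] for i in range(D)]
sigma_n = [[2.0 if i == j else 0.0 for j in range(D)] for i in range(D)]

quiet_mu, quiet_sigma = vts_adapt_static(mu_x, sigma_x, mu_h, [-60.0, 0.0, 0.0, 0.0], sigma_n, C, C_INV)
loud_mu, loud_sigma = vts_adapt_static(mu_x, sigma_x, mu_h, [60.0, 0.0, 0.0, 0.0], sigma_n, C, C_INV)
print([round(v, 3) for v in quiet_mu], [round(v, 3) for v in loud_mu])
```

The two calls bracket the behavior of the VTS rule: when the noise mean is far below μ_x + μ_h, G(j,k) approaches the identity and the Gaussian is merely shifted by μ_h; when it is far above, G(j,k) approaches zero and the adapted Gaussian approaches the noise Gaussian.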

For the delta and delta/delta portions of MFCC vectors, the adaptor component 104 can use the following to adapt the mean vector and covariance matrix:

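$\begin{matrix}{\mu_{\Delta y,jk} \approx G(j,k)\mu_{\Delta x,jk} + (I - G(j,k))\mu_{\Delta n}} & (16)\end{matrix}$

$\begin{matrix}{\mu_{\Delta\Delta y,jk} \approx G(j,k)\mu_{\Delta\Delta x,jk} + (I - G(j,k))\mu_{\Delta\Delta n}} & (17)\end{matrix}$

$\begin{matrix}{\Sigma_{\Delta y,jk} \approx G(j,k)\Sigma_{\Delta x,jk}G(j,k)^{T} + (I - G(j,k))\Sigma_{\Delta n}(I - G(j,k))^{T}} & (18)\end{matrix}$

$\begin{matrix}{\Sigma_{\Delta\Delta y,jk} \approx G(j,k)\Sigma_{\Delta\Delta x,jk}G(j,k)^{T} + (I - G(j,k))\Sigma_{\Delta\Delta n}(I - G(j,k))^{T}} & (19)\end{matrix}$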
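Equations (16) through (19) reuse the matrix G(j,k) from Equation (13), so the dynamic streams need only matrix products. The Python sketch below assumes G(j,k) has already been computed; the 2×2 value G = 0.5·I and the other inputs are hypothetical, chosen to make the arithmetic easy to follow.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(a, v):
    return [sum(p * q for p, q in zip(row, v)) for row in a]


def vts_adapt_dynamic(g, mu_dx, sigma_dx, mu_dn, sigma_dn):
    """Equations (16)-(19): one rule for the delta and the delta/delta streams.

    g is G(j, k) from Equation (13), computed once from the static means.
    """
    dim = len(g)
    i_g = [[(1.0 if i == j else 0.0) - g[i][j] for j in range(dim)] for i in range(dim)]
    mu_dy = [p + q for p, q in zip(matvec(g, mu_dx), matvec(i_g, mu_dn))]   # (16)/(17)
    term_x = matmul(matmul(g, sigma_dx), transpose(g))
    term_n = matmul(matmul(i_g, sigma_dn), transpose(i_g))
    sigma_dy = [[p + q for p, q in zip(rx, rn)] for rx, rn in zip(term_x, term_n)]  # (18)/(19)
    return mu_dy, sigma_dy


G = [[0.5, 0.0], [0.0, 0.5]]          # hypothetical G(j, k)
mu_dy, sigma_dy = vts_adapt_dynamic(
    G,
    mu_dx=[1.0, 2.0], sigma_dx=[[1.0, 0.0], [0.0, 1.0]],
    mu_dn=[0.0, 0.0], sigma_dn=[[2.0, 0.0], [0.0, 2.0]],
)
print(mu_dy, sigma_dy)
```

Because G(j,k) depends only on the static means, it can be computed once per Gaussian and shared by the delta and delta/delta updates.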

As can be discerned, the example above (e.g., Equations (13)-(19)) describes the adaptor component 104 adapting mean vectors and covariance matrices, including the delta and delta/delta portions of MFCC vectors, without accounting for phase. It is to be understood, however, that the adaptor component 104 can consider phase when adapting such parameters.

For instance, as noted above, the adaptor component 104 can use a first-order VTS approximation with respect to x, n, and h. In an example, the adaptor component 104 can assume that the phase-factor vector α is independent of x, n, and h, and the following can be obtained:

$\begin{matrix}{y \approx \mu_{x} + \mu_{h} + g_{\alpha}(\mu_{x},\mu_{h},\mu_{n}) + G(x - \mu_{x}) + G(h - \mu_{h}) + (I - G)(n - \mu_{n}),\quad\text{where}} & (20) \\ {\left.\frac{\partial y}{\partial x}\right|_{\mu_{x},\mu_{n},\mu_{h}} = \left.\frac{\partial y}{\partial h}\right|_{\mu_{x},\mu_{n},\mu_{h}} = G,} & (21) \\ {\left.\frac{\partial y}{\partial n}\right|_{\mu_{x},\mu_{n},\mu_{h}} = I - G,} & (22) \\ {G = I - C\,\mathrm{diag}\left(\frac{\exp\left(C^{-1}(\mu_{n} - \mu_{x} - \mu_{h})\right) + \alpha\cdot\exp\left(C^{-1}(\mu_{n} - \mu_{x} - \mu_{h})/2\right)}{1 + \exp\left(C^{-1}(\mu_{n} - \mu_{x} - \mu_{h})\right) + 2\alpha\cdot\exp\left(C^{-1}(\mu_{n} - \mu_{x} - \mu_{h})/2\right)}\right)C^{-1},} & (23)\end{matrix}$

where diag(.) denotes a diagonal matrix whose diagonal elements equal the elements of the vector in the argument, μ_n is an additive distortion mean vector, μ_h is a convolutive distortion mean vector, and μ_x is a clean speech mean vector. Each division of a vector is also an element-wise operation.

Continuing with the example above, for a given additive distortion mean vector μ_n and convolutive distortion mean vector μ_h, the value of G(.) depends on the clean speech mean vector μ_x. Specifically, for the k-th Gaussian in the j-th state of the compressed model 106, the adaptor component 104 can determine the matrix G_α(j,k) as follows:

$\begin{matrix}{G_{\alpha}(j,k) = I - C\cdot\mathrm{diag}\left(\frac{\exp\left(C^{-1}(\mu_{n} - \mu_{x,jk} - \mu_{h})\right) + \alpha\cdot\exp\left(C^{-1}(\mu_{n} - \mu_{x,jk} - \mu_{h})/2\right)}{1 + \exp\left(C^{-1}(\mu_{n} - \mu_{x,jk} - \mu_{h})\right) + 2\alpha\cdot\exp\left(C^{-1}(\mu_{n} - \mu_{x,jk} - \mu_{h})/2\right)}\right)\cdot C^{-1}} & (24)\end{matrix}$
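Equations (21) and (24) state that G_α(j,k) is the Jacobian of the phase-sensitive model of Equation (11) with respect to x, evaluated at the Gaussian mean. The following Python sketch checks that claim with central finite differences. A truncated orthonormal DCT-II is assumed for C, and the means and phase factors are arbitrary test values; keeping |α| ≤ 0.5 guarantees that the argument of the logarithm stays positive.

```python
import math


def dct_matrix(num_ceps, num_filters):
    """Truncated orthonormal DCT-II matrix C (num_ceps x num_filters)."""
    return [[math.sqrt((1.0 if i == 0 else 2.0) / num_filters)
             * math.cos(math.pi * i * (2 * m + 1) / (2 * num_filters))
             for m in range(num_filters)] for i in range(num_ceps)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matvec(a, v):
    return [sum(p * q for p, q in zip(row, v)) for row in a]


def distorted_cepstrum(x, mu_h, mu_n, alpha, c, c_inv):
    """Equation (11) as a function of x, with h and n held at their means."""
    u = matvec(c_inv, [ni - xi - hi for xi, hi, ni in zip(x, mu_h, mu_n)])
    inner = [math.log(1.0 + math.exp(ul) + 2.0 * al * math.exp(ul / 2.0))
             for ul, al in zip(u, alpha)]
    return [xi + hi + gi for xi, hi, gi in zip(x, mu_h, matvec(c, inner))]


def g_alpha_matrix(mu_x, mu_h, mu_n, alpha, c, c_inv):
    """Equation (24): G_alpha = I - C diag(r) C^-1 with the ratio r below."""
    u = matvec(c_inv, [ni - xi - hi for xi, hi, ni in zip(mu_x, mu_h, mu_n)])
    r = [(math.exp(ul) + al * math.exp(ul / 2.0))
         / (1.0 + math.exp(ul) + 2.0 * al * math.exp(ul / 2.0))
         for ul, al in zip(u, alpha)]
    dim = len(mu_x)
    return [[(1.0 if i == j else 0.0)
             - sum(c[i][m] * r[m] * c_inv[m][j] for m in range(len(r)))
             for j in range(dim)] for i in range(dim)]


def numerical_jacobian(f, x, eps=1e-5):
    """Central-difference Jacobian J[i][j] = d f_i / d x_j."""
    cols = []
    for j in range(len(x)):
        plus = list(x)
        plus[j] += eps
        minus = list(x)
        minus[j] -= eps
        cols.append([(a - b) / (2.0 * eps) for a, b in zip(f(plus), f(minus))])
    return transpose(cols)


L, D = 8, 4
C = dct_matrix(D, L)
C_INV = transpose(C)
mu_x = [1.0, -0.5, 0.2, 0.1]
mu_h = [0.3, 0.1, 0.0, -0.05]
mu_n = [0.8, 0.2, -0.1, 0.0]
alpha = [0.1 * (m - 4) for m in range(L)]   # assumed phase factors in [-0.4, 0.3]

G = g_alpha_matrix(mu_x, mu_h, mu_n, alpha, C, C_INV)
J = numerical_jacobian(lambda v: distorted_cepstrum(v, mu_h, mu_n, alpha, C, C_INV), mu_x)
max_err = max(abs(G[i][j] - J[i][j]) for i in range(D) for j in range(D))
print(f"max |G_alpha - dy/dx| = {max_err:.2e}")
```

The same check with respect to n would return I - G_α, matching Equation (22); the factor 2 on α in the denominator is what makes the analytic and numerical Jacobians agree.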

Thereafter, the adaptor component 104 can obtain the Gaussian mean vector (for the k-th Gaussian in the j-th state) of the adapted compressed model 106 by taking the expectation of both sides of Equation (20):

$\begin{matrix}{\mu_{y,jk,\alpha} \approx \mu_{x,jk} + \mu_{h} + g_{\alpha}(\mu_{x,jk},\mu_{h},\mu_{n}),} & (25)\end{matrix}$

which can be applied only to a static portion of the MFCC vector.

The adaptor component 104 can further estimate a covariance matrix Σ_{y,jk,α} in the adapted compressed model 106 by determining a weighted sum of the covariance matrix Σ_{x,jk} of the compressed model 106 (prior to the parameters 110 therein being adapted) and the covariance matrix of the distortion Σ_n. In an example, the adaptor component 104 can determine the covariance matrix Σ_{y,jk,α} by taking the variance of both sides of Equation (20) as follows:

$\begin{matrix}{\Sigma_{y,jk,\alpha} = G_{\alpha}(j,k)\Sigma_{x,jk}G_{\alpha}(j,k)^{T} + (I - G_{\alpha}(j,k))\Sigma_{n}(I - G_{\alpha}(j,k))^{T}.} & (26)\end{matrix}$

Furthermore, the adaptor component 104 need not take a convolutive distortion variance into consideration, as the convolutive distortion can be treated as a fixed, deterministic quantity in a given utterance. For the delta and delta/delta portions of MFCC vectors, the adaptor component 104 may use the following adaption formulas for the mean vector and covariance matrix in connection with adapting the parameters of the compressed model 106:

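$\begin{matrix}{\mu_{\Delta y,jk,\alpha} \approx G_{\alpha}(j,k)\mu_{\Delta x,jk} + (I - G_{\alpha}(j,k))\mu_{\Delta n},} & (27)\end{matrix}$

$\begin{matrix}{\mu_{\Delta\Delta y,jk,\alpha} \approx G_{\alpha}(j,k)\mu_{\Delta\Delta x,jk} + (I - G_{\alpha}(j,k))\mu_{\Delta\Delta n}} & (28)\end{matrix}$

$\begin{matrix}{\Sigma_{\Delta y,jk,\alpha} \approx G_{\alpha}(j,k)\Sigma_{\Delta x,jk}G_{\alpha}(j,k)^{T} + (I - G_{\alpha}(j,k))\Sigma_{\Delta n}(I - G_{\alpha}(j,k))^{T}} & (29)\end{matrix}$

$\begin{matrix}{\Sigma_{\Delta\Delta y,jk,\alpha} \approx G_{\alpha}(j,k)\Sigma_{\Delta\Delta x,jk}G_{\alpha}(j,k)^{T} + (I - G_{\alpha}(j,k))\Sigma_{\Delta\Delta n}(I - G_{\alpha}(j,k))^{T}} & (30)\end{matrix}$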

Thus, it can be discerned that the adaptor component 104 can adapt parameters of the compressed model 106 by considering the phase corresponding to uttered speech. Furthermore, to improve performance of speech recognition using the compressed model 106, the adaptor component 104 can perform the following actions: a) determine initial estimates for the convolutive distortion mean vector μ_h, the additive distortion mean vector μ_n, and the diagonal covariance matrix diag(.); b) adapt parameters of the compressed model 106 based at least in part upon such initial estimates; c) decode the received speech utterance using the compressed model 106 with the adapted parameters; d) re-estimate the distortion parameters based upon the received speech utterance and the adapted compressed model 106; and e) adapt parameters of the compressed model 106 based at least in part upon the re-estimated parameters.
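Steps a) through e) amount to a short control loop. The Python sketch below expresses that ordering with caller-supplied hooks; `initialize`, `adapt`, `decode`, and `reestimate` are hypothetical placeholders for the estimation, adaption (e.g., Equations (25)-(30)), decoding, and EM re-estimation procedures described herein. Decoding once more with the re-adapted model at the end of each pass is an assumption, not one of the listed steps.

```python
from typing import Callable, List, Tuple, TypeVar

P = TypeVar("P")   # distortion parameters (e.g., mu_h, mu_n, covariances)
M = TypeVar("M")   # adapted model
R = TypeVar("R")   # decoding result (e.g., hypothesis and posteriors)


def adapt_decode_loop(
    frames: List[List[float]],
    initialize: Callable[[List[List[float]]], P],
    adapt: Callable[[P], M],
    decode: Callable[[List[List[float]], M], R],
    reestimate: Callable[[List[List[float]], M, R, P], P],
    num_passes: int = 1,
) -> Tuple[M, R]:
    """Steps a)-e): initialize, adapt, decode, re-estimate, and adapt again."""
    params = initialize(frames)          # a) initial distortion estimates
    model = adapt(params)                # b) first adaption
    result = decode(frames, model)       # c) decode with the adapted model
    for _ in range(num_passes):
        params = reestimate(frames, model, result, params)  # d) re-estimation
        model = adapt(params)                                # e) re-adaption
        result = decode(frames, model)   # assumed: decode with the updated model
    return model, result


calls = []


def toy_init(frames):
    calls.append("init")
    return 0.0


def toy_adapt(params):
    calls.append("adapt")
    return params


def toy_decode(frames, model):
    calls.append("decode")
    return model + 1.0


def toy_reestimate(frames, model, result, params):
    calls.append("reestimate")
    return result


model, result = adapt_decode_loop([[0.0]], toy_init, toy_adapt, toy_decode, toy_reestimate)
print(calls)
```

Setting `num_passes` above 1 repeats steps d) and e), which corresponds to performing re-estimation and adaption iteratively.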

In an example, the adaptor component 104 can initialize the convolutive distortion mean vector μ_h, for instance, by setting each element of the vector to zero. Furthermore, the adaptor component 104 can initialize the additive distortion mean vector μ_n using sample estimates from at least a first plurality of (speech-free) frames of the received distorted speech utterance. In addition, the adaptor component 104 can also use sample estimates from a last plurality of frames of the received distorted speech utterance in connection with initializing the additive distortion mean vector μ_n. Still further, the adaptor component 104 can initialize the diagonal covariance matrix diag(.) by using sample estimates from the first and/or last plurality of frames of the received distorted speech utterance.
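The initialization described above can be sketched as follows in Python. The number of leading and trailing frames treated as speech-free (`num_head`, `num_tail`) is a hypothetical parameter, since no specific count is given, and frames are assumed to be cepstral vectors of equal length.

```python
from typing import List, Sequence, Tuple


def initialize_distortion_estimates(
    frames: Sequence[Sequence[float]], num_head: int = 10, num_tail: int = 0
) -> Tuple[List[float], List[float], List[float]]:
    """Return initial (mu_h, mu_n, diagonal noise variance) estimates.

    frames: cepstral feature vectors of the received utterance.
    num_head / num_tail: leading / trailing frames assumed to be speech-free
    (hypothetical parameters; the description does not fix these counts).
    """
    if num_head + num_tail == 0 or num_head + num_tail > len(frames):
        raise ValueError("need between 1 and len(frames) speech-free frames")
    noise = list(frames[:num_head]) + (list(frames[-num_tail:]) if num_tail else [])
    dim, count = len(frames[0]), len(noise)
    mu_n = [sum(f[d] for f in noise) / count for d in range(dim)]
    var_n = [sum((f[d] - mu_n[d]) ** 2 for f in noise) / count for d in range(dim)]
    mu_h = [0.0] * dim   # convolutive distortion mean starts at zero
    return mu_h, mu_n, var_n


frames = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0], [5.0, 6.0]]
mu_h, mu_n, var_n = initialize_distortion_estimates(frames, num_head=2, num_tail=1)
print(mu_h, mu_n, var_n)
```

The third frame in the example, which stands in for a speech frame, is excluded from the noise statistics.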

As noted above, the adaptor component 104 can use the initialized convolutive distortion mean vector μ_h, additive distortion mean vector μ_n, and diagonal covariance matrix diag(.) to initially adapt parameters of the compressed model 106. The adaptor component 104 can then decode the received distorted speech utterance and can re-estimate parameters used in the adaption algorithms noted above. In an example, the adaptor component 104 can estimate parameters pertaining to the convolutive distortion mean, the static and dynamic additive noise means, and the static and dynamic additive noise variances. For instance, the adaptor component 104 can use an expectation-maximization (EM) algorithm in connection with re-estimating the aforementioned parameters using the first-order VTS approximation.

For example, γ_t(j,k) can denote a posterior probability for the k-th Gaussian in the j-th state of the compressed model 106:

$\begin{matrix}{\gamma_{t}(j,k) = p(\theta_{t} = j, \varepsilon_{t} = k \mid Y, \lambda),} & (31)\end{matrix}$

where θ_t denotes the state index at time frame t, ε_t denotes the Gaussian index at time frame t, and λ denotes the previous parameter set pertaining to additive and convolutive distortions. The adaptor component 104 can use the following algorithms to re-estimate the convolutive distortion mean vector μ_h, the static and dynamic additive distortion mean vectors μ_n, μ_{Δn}, and μ_{ΔΔn}, and the static and dynamic additive distortion variances Σ_n, Σ_{Δn}, and Σ_{ΔΔn}:

$\begin{matrix}{\mu_{h} = \mu_{h,0} + \left\{\sum_{t}\sum_{j\in\Omega_{s}}\sum_{k\in\Omega_{m}}\gamma_{t}(j,k)\,G_{\alpha}(j,k)^{T}\Sigma_{y,jk,\alpha}^{-1}G_{\alpha}(j,k)\right\}^{-1}\cdot\left\{\sum_{t}\sum_{j\in\Omega_{s}}\sum_{k\in\Omega_{m}}\gamma_{t}(j,k)\,G_{\alpha}(j,k)^{T}\Sigma_{y,jk,\alpha}^{-1}\left[y_{t} - \mu_{x,jk} - \mu_{h,0} - g_{\alpha}(\mu_{x,jk},\mu_{h,0},\mu_{n,0})\right]\right\}} & (32) \\
{\mu_{n} = \mu_{n,0} + \left\{\sum_{t}\sum_{j\in\Omega_{s}}\sum_{k\in\Omega_{m}}\gamma_{t}(j,k)\,(I - G_{\alpha}(j,k))^{T}\Sigma_{y,jk,\alpha}^{-1}(I - G_{\alpha}(j,k))\right\}^{-1}\cdot\left\{\sum_{t}\sum_{j\in\Omega_{s}}\sum_{k\in\Omega_{m}}\gamma_{t}(j,k)\,(I - G_{\alpha}(j,k))^{T}\Sigma_{y,jk,\alpha}^{-1}\left[y_{t} - \mu_{x,jk} - \mu_{h,0} - g_{\alpha}(\mu_{x,jk},\mu_{h,0},\mu_{n,0})\right]\right\}} & (33) \\
{\mu_{\Delta n} = \mu_{\Delta n,0} + \left\{\sum_{t}\sum_{j\in\Omega_{s}}\sum_{k\in\Omega_{m}}\gamma_{t}(j,k)\,(I - G_{\alpha}(j,k))^{T}\Sigma_{\Delta y,jk,\alpha}^{-1}(I - G_{\alpha}(j,k))\right\}^{-1}\cdot\left\{\sum_{t}\sum_{j\in\Omega_{s}}\sum_{k\in\Omega_{m}}\gamma_{t}(j,k)\,(I - G_{\alpha}(j,k))^{T}\Sigma_{\Delta y,jk,\alpha}^{-1}\left[\Delta y_{t} - G_{\alpha}(j,k)\mu_{\Delta x,jk} - (I - G_{\alpha}(j,k))\mu_{\Delta n,0}\right]\right\}} & (34) \\
{\mu_{\Delta\Delta n} = \mu_{\Delta\Delta n,0} + \left\{\sum_{t}\sum_{j\in\Omega_{s}}\sum_{k\in\Omega_{m}}\gamma_{t}(j,k)\,(I - G_{\alpha}(j,k))^{T}\Sigma_{\Delta\Delta y,jk,\alpha}^{-1}(I - G_{\alpha}(j,k))\right\}^{-1}\cdot\left\{\sum_{t}\sum_{j\in\Omega_{s}}\sum_{k\in\Omega_{m}}\gamma_{t}(j,k)\,(I - G_{\alpha}(j,k))^{T}\Sigma_{\Delta\Delta y,jk,\alpha}^{-1}\left[\Delta\Delta y_{t} - G_{\alpha}(j,k)\mu_{\Delta\Delta x,jk} - (I - G_{\alpha}(j,k))\mu_{\Delta\Delta n,0}\right]\right\}} & (35) \\
{\Sigma_{n} = \Sigma_{n,0} - \left(\frac{\partial^{2}Q}{\partial\Sigma_{n}^{2}}\right)_{\Sigma_{n}=\Sigma_{n,0}}^{-1}\left(\frac{\partial Q}{\partial\Sigma_{n}}\right)_{\Sigma_{n}=\Sigma_{n,0}}} & (36)\end{matrix}$

In Equation (36), Q denotes the auxiliary function of the EM algorithm, so that the update is a Newton step on Σ_n. The adaptor component 104 can estimate Σ_{Δn} and Σ_{ΔΔn} in a similar manner as shown in Equation (36) by replacing static parameters with the corresponding delta and delta/delta parameters. Furthermore, while the above equations consider phase, it is to be understood that phase need not be taken into consideration when using such algorithms.
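Equation (32) is a weighted Gauss-Newton update: the residual between each observation and the model prediction at the current estimate is projected through G_α(j,k) and normalized by the accumulated curvature. The Python sketch below applies it in a deliberately reduced setting (one log-Mel channel, so C = C⁻¹ = 1, a fixed μ_n, and posteriors γ_t(j,k) = 1) on synthetic, noise-free data. These simplifications and the helper names are assumptions made only to show that repeated updates recover the μ_h that generated the data.

```python
import math


def g_alpha_1d(mu_x, mu_h, mu_n, alpha):
    """Scalar form of Equation (12) with C = C^-1 = 1 (single log-Mel channel)."""
    u = mu_n - mu_x - mu_h
    return math.log(1.0 + math.exp(u) + 2.0 * alpha * math.exp(u / 2.0))


def gain_1d(mu_x, mu_h, mu_n, alpha):
    """Scalar form of Equation (24)."""
    u = mu_n - mu_x - mu_h
    num = math.exp(u) + alpha * math.exp(u / 2.0)
    den = 1.0 + math.exp(u) + 2.0 * alpha * math.exp(u / 2.0)
    return 1.0 - num / den


def reestimate_mu_h(stats, mu_h0, mu_n0, alpha):
    """One update of Equation (32) in the scalar case.

    stats: (gamma_t(j,k), y_t, mu_x_jk, var_y_jk) tuples, one per frame and Gaussian.
    """
    num = den = 0.0
    for gamma, y_t, mu_x, var_y in stats:
        g = gain_1d(mu_x, mu_h0, mu_n0, alpha)
        residual = y_t - mu_x - mu_h0 - g_alpha_1d(mu_x, mu_h0, mu_n0, alpha)
        den += gamma * g * g / var_y
        num += gamma * g * residual / var_y
    return mu_h0 + num / den


true_mu_h, mu_n, alpha = 0.7, 0.5, 0.1
stats = []
for mu_x in (1.0, 2.0, -0.5):
    # Noise-free synthetic observation generated by the model itself.
    y_t = mu_x + true_mu_h + g_alpha_1d(mu_x, true_mu_h, mu_n, alpha)
    stats.append((1.0, y_t, mu_x, 1.0))

mu_h = 0.0
for _ in range(20):
    mu_h = reestimate_mu_h(stats, mu_h, mu_n, alpha)
print(round(mu_h, 6))
```

In the full procedure μ_n would be updated jointly through Equation (33), and the posteriors γ_t(j,k) would come from decoding rather than being fixed at one.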

Referring now to FIG. 3, an example system 300 that facilitates compressing a model to be used in a speech recognition system is illustrated. The system 300 includes a compressor component 302 that compresses an uncompressed model 304. The system 300 further includes a data store 306, wherein the compressor component 302 can cause a compressed model 308 to be stored in the data store 306. The compressed model 308 can be a compressed version of the uncompressed model 304. Furthermore, the compressor component 302 can create a codebook 310 when compressing the uncompressed model 304 and can cause the codebook 310 to be stored in the data store 306 (or another suitable data store). The codebook 310 can, for instance, be used by the adaptor component 104 (FIG. 1) to expand a compressed model.

Pursuant to an example, the uncompressed model 304 can be an HMM that has been trained with clean speech and/or trained with “unclean” speech (e.g., speech that includes numerous distortions). Furthermore, the compressor component 302 can use any suitable compression technique to compress the uncompressed model 304 (e.g., to create the compressed model 308). In an example, the compressor component 302 can use a subspace coding (SSC) scheme. For instance, the uncompressed model 304 can include K Gaussians, each with fixed dimension D and with a diagonal covariance matrix, and the compressor component 302 can perform quantization along each separate dimension of the Gaussians. Thus, the compressor component 302 can quantize dimension d of Gaussian k, represented as the two-dimensional vector (m_k[d], v_k²[d]), to a two-dimensional codeword (μ_n[d], σ_n²[d]), where n indexes the codewords for dimension d. The compressor component 302 can also use a standard k-means clustering algorithm to design the codebook 310 with a conventional distortion measure for Gaussian models. The centroid of a cluster can be computed by uniform quantization. It is to be understood, however, that the compressor component 302 can use any suitable compression technique and/or any suitable technique to design the codebook 310 in connection with compressing the uncompressed model 304 (resulting in the compressed model 308).
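As an illustration of per-dimension subspace coding, the Python sketch below builds one codebook per dimension from the (mean, variance) pairs of all Gaussians, stores one codeword index per Gaussian and dimension, and expands the indices back. It substitutes plain k-means with squared Euclidean distance and arithmetic-mean centroids for the Gaussian distortion measure and uniform-quantization centroids mentioned above; the function names and toy values are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]   # (mean, variance) for one Gaussian and one dimension


def sq_dist(p: Point, q: Point) -> float:
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def nearest(p: Point, codebook: Sequence[Point]) -> int:
    return min(range(len(codebook)), key=lambda i: sq_dist(p, codebook[i]))


def kmeans(points: List[Point], size: int, iters: int = 20, seed: int = 0) -> List[Point]:
    """Plain k-means; squared Euclidean distance stands in for a Gaussian measure."""
    centroids = random.Random(seed).sample(points, size)
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else old
            for cl, old in zip(clusters, centroids)
        ]
    return centroids


def ssc_compress(means, variances, size):
    """One codebook per dimension d; one codeword index per (Gaussian k, dimension d)."""
    num_gauss, dim = len(means), len(means[0])
    codebooks, indices = [], [[0] * dim for _ in range(num_gauss)]
    for d in range(dim):
        points = [(means[k][d], variances[k][d]) for k in range(num_gauss)]
        codebook = kmeans(points, size, seed=d)
        codebooks.append(codebook)
        for k, p in enumerate(points):
            indices[k][d] = nearest(p, codebook)
    return codebooks, indices


def ssc_expand(codebooks, indices):
    """Rebuild approximate means and variances from the codeword indices."""
    means = [[codebooks[d][i][0] for d, i in enumerate(row)] for row in indices]
    variances = [[codebooks[d][i][1] for d, i in enumerate(row)] for row in indices]
    return means, variances


means = [[0.0, 1.0], [0.01, 1.02], [5.0, -2.0], [5.02, -2.01]]
variances = [[1.0, 0.5], [1.0, 0.5], [2.0, 0.8], [2.0, 0.8]]
codebooks, indices = ssc_compress(means, variances, size=2)
approx_means, approx_vars = ssc_expand(codebooks, indices)
print(indices)
```

For K Gaussians of dimension D, this layout replaces 2KD stored floating-point values with D small codebooks plus KD codeword indices, which is where the memory savings come from.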

With reference now to FIG. 4, an example system 400 that facilitates selectively adapting compressed models in a speech recognition system is illustrated. The system 400 includes the receiver component 102 that can receive the distorted speech utterance. The adaptor component 104 can adapt a compressed model based at least in part upon the received distorted speech utterance. Pursuant to an example, the system 400 can additionally include a data store 402, which can retain multiple compressed models 404-406. For instance, the data store 402 can include a first compressed model 404 through an Nth compressed model 406, wherein different compressed models may be trained in connection with recognizing different words, sounds, syllables, etc. The data store 402 can further include a codebook 408 that can be used by the adaptor component 104 to “expand” one or more of the models when decoding the received speech utterance and/or adapting parameters of one or more of the compressed models 404-406.

The adaptor component 104 can include an expander component 410 that can use the codebook 408 to expand one or more of the compressed models 404-406 for decoding and/or adaption of parameters thereof. The adaptor component 104 can additionally include a decoder component 412 that can use one or more of the compressed models 404-406 to decode the distorted speech utterance received by the receiver component 102.

The adaptor component 104 can use various techniques to adapt parameters of one or more of the compressed models 404-406, wherein a chosen technique can be based at least in part upon available memory and/or processing resources corresponding to a speech recognition system. In a first example, the adaptor component 104 can estimate various parameters corresponding to the received distorted speech utterance. As noted above, the adaptor component 104 can use sample estimates from a first plurality of frames of the received distorted speech utterance to estimate the various parameters prior to the distorted speech utterance being decoded. The expander component 410 can access the codebook 408 and expand the compressed models 404-406, and the adaptor component 104 can adapt each of the compressed models 404-406 using the estimated parameters. The decoder component 412 can decode the received distorted speech utterance using the adapted compressed models, and the adaptor component 104 can perform re-estimation as described above (e.g., using Equations (32)-(36)). Subsequent to the decoder component 412 decoding the distorted speech utterance, the adaptor component 104 can use the parameters determined by way of Equations (32)-(36) to again adapt parameters of each of the compressed models 404-406. Re-estimation and adaption may occur iteratively if desired. The received distorted speech utterance may then be decoded using a subset of the compressed models 404-406.

In a second example, the adaptor component 104 can again estimate various parameters corresponding to the received distorted speech utterance. The expander component 410 can expand each of the compressed models 404-406 using the codebook 408 (or a plurality of codebooks). The adaptor component 104 can use such estimates to initially adapt parameters of each of the compressed models 404-406. The decoder component 412 can decode the distorted speech utterance using a subset of the compressed models 404-406 (with adapted parameters). For instance, the decoder component 412 may observe a relatively small number of the compressed models 404-406 when decoding a particular speech utterance. To conserve computational resources, the adaptor component 104 can adapt parameters of models that are observed during decoding of the received distorted speech utterance while not adapting other, unobserved compressed models. Still further, the adaptor component 104 may adapt observed Gaussians in compressed models used by the decoder component 412 when decoding the distorted speech utterance but may not adapt unobserved Gaussians. The adaptor component 104, for instance, can use the adaption algorithms of Equations (16)-(19) in connection with adapting parameters of the subset of compressed models 404-406. The decoder component 412 may then decode the distorted speech utterance using the observed, adapted subset of the compressed models 404-406. After the compressed models 404-406 have been adapted, vector quantization (VQ) can be used to compress the models (after expansion) while retaining the original mapping relation in SSC. Pursuant to an example, data-driven VQ, uniform VQ, or another vector quantization technique can be used in connection with compressing adapted models.
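The selective adaptation and subsequent recompression in the second example can be sketched as follows. This is a minimal illustration under stated assumptions: observed Gaussians are given as a boolean mask, `adapt_fn` is a placeholder for whatever adaptation formulas are used (such as Equations (16)-(19), which are not reproduced in this section), and "retaining the original mapping relation" is interpreted as keeping every Gaussian's codeword indices fixed while refitting each codeword as the centroid of the adapted sub-vectors assigned to it. That refit is one possible choice and is not necessarily the data-driven or uniform VQ named in the text; both function names are hypothetical.

```python
import numpy as np


def adapt_observed(means, observed, adapt_fn):
    """Apply adapt_fn only to rows (Gaussians) flagged in the boolean mask."""
    adapted = means.copy()
    adapted[observed] = adapt_fn(means[observed])  # unobserved rows untouched
    return adapted


def requantize_fixed_mapping(adapted_means, indices, codebooks, dims):
    """Refit each codeword as the mean of the adapted sub-vectors that already
    map to it, so the Gaussian-to-codeword index table is left unchanged."""
    new_codebooks = []
    for k, d in enumerate(dims):
        sub = adapted_means[:, d]          # (num_gaussians, len(d)) sub-vectors
        book = codebooks[k].copy()
        for j in range(book.shape[0]):
            members = indices[:, k] == j   # Gaussians that use codeword j
            if members.any():              # unused codewords keep old values
                book[j] = sub[members].mean(axis=0)
        new_codebooks.append(book)
    return new_codebooks
```

Because the index table is unchanged, the storage layout of the compressed model is preserved. When several Gaussians share a codeword, the refit codeword is their average, so the recompressed model only approximates the adapted one.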

In a third example, the adaptor component 104 can adapt a subset of the compressed models 404-406 without first adapting every model in the compressed models 404-406 using the estimates described in the first two examples. For instance, the decoder component 412 can decode the speech utterance using the compressed models 404-406, wherein the compressed models 404-406 have not been subject to adaption by the adaptor component 104. A subset of the compressed models 404-406 used by the decoder component 412 to decode the distorted speech utterance can be identified, and the adaptor component 104 can adapt parameters of such subset of the compressed models 404-406 (e.g., using initial estimates of the convolutive distortion mean vector μ_(h), the additive distortion mean vector μ_(n), and the diagonal covariance matrix diag(.)) while not adapting parameters corresponding to compressed models not observed during decoding.

After the adaptor component 104 adapts parameters of the subset of compressed models 404-406 as described above, the decoder component 412 can use such subset of compressed models 404-406 to decode the distorted speech utterance a second time. Thereafter, the adaptor component 104 can perform re-estimation as described above (e.g., using Equations (32)-(36)), and can adapt parameters of a subset of compressed models 404-406 observed during the second decoding of the speech utterance. The decoder component 412 can then decode the distorted speech utterance a third time. As with the second example, after the subset of compressed models 404-406 observed during the third decoding has been adapted, vector quantization (VQ) can be used to compress the models while retaining the original mapping relation in SSC.

With reference now to FIGS. 5-12, various example methodologies are illustrated and described. While the methodologies are described as being a series of acts that are performed in a sequence, it is to be understood that the methodologies are not limited by the order of the sequence. For instance, some acts may occur in a different order than what is described herein. In addition, an act may occur concurrently with another act. Furthermore, in some instances, not all acts may be required to implement a methodology described herein.

Moreover, the acts described herein may be computer-executable instructions that can be implemented by one or more processors and/or stored on a computer-readable medium or media. The computer-executable instructions may include a routine, a sub-routine, programs, a thread of execution, and/or the like. Still further, results of acts of the methodologies may be stored in a computer-readable medium, displayed on a display device, and/or the like.

Referring now to FIG. 5, an example methodology 500 that facilitates adapting a compressed model used in a speech recognition system (e.g., in a mobile apparatus) is illustrated. The methodology 500 begins at 502, and at 504 a distorted speech utterance is received (e.g., through use of one or more microphones in a mobile apparatus, such as a mobile telephone). At 506, parameters of a compressed model in the speech recognition system are adapted based at least in part upon the received distorted speech utterance. The methodology 500 completes at 508.

With reference now to FIG. 6, an example methodology 600 for adapting parameters of a compressed model in a speech recognition system is illustrated. The methodology 600 starts at 602, and at 604 a distorted speech utterance is received. At 606, a first set of parameters corresponding to a plurality of compressed models used in the speech recognition system can be coarsely estimated. In an example, the first set of parameters can be a convolutive distortion mean vector, an additive distortion mean vector, and/or a diagonal covariance matrix (described above). Moreover, for instance, the convolutive distortion mean vector can be initialized (e.g., such that all of its elements are zero), and the additive distortion mean vector can be initialized using sample estimates from a plurality of speech-free frames of the received distorted speech utterance. Similarly, the diagonal covariance matrix diag(.) can be initialized using sample estimates from a plurality of speech-free frames of the received distorted speech utterance.
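The coarse initialization at 606 can be sketched as below. The sketch assumes that the first `num_noise_frames` frames of the utterance are speech-free (the default of 20 is an arbitrary, hypothetical choice) and that only static features are handled. It returns the zero-initialized convolutive distortion mean vector, the sample mean of the leading frames as the additive distortion mean vector, and their per-dimension sample variance as the diagonal of the covariance matrix.

```python
import numpy as np


def init_distortion_params(frames, num_noise_frames=20):
    """Coarse initial estimates from leading frames assumed to be speech-free.

    frames: (T, D) array of static features of the distorted utterance.
    Returns (mu_h, mu_n, var_n):
      mu_h  - convolutive distortion mean vector, all zeros;
      mu_n  - additive distortion mean vector, sample mean of leading frames;
      var_n - diagonal of the additive distortion covariance, sample variance.
    """
    frames = np.asarray(frames, dtype=float)
    noise = frames[:num_noise_frames]      # hypothetical speech-free segment
    mu_h = np.zeros(frames.shape[1])
    mu_n = noise.mean(axis=0)
    var_n = noise.var(axis=0)              # diagonal only; covariances ignored
    return mu_h, mu_n, var_n
```

In practice the number of leading frames would be chosen so that they are unlikely to contain speech; a voice-activity detector could select the speech-free frames instead, but that is outside what this methodology specifies.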

At 608, each of the plurality of compressed models in the speech recognition system is adapted (e.g., parameters thereof are adapted) based at least in part upon the first set of parameters estimated at act 606. For example, Equations (13)-(19) can be used in connection with adapting each of the plurality of compressed models in the speech recognition system.
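Equations (13)-(19) are not reproduced in this section. As a stand-in, the following sketch applies a widely used vector Taylor series (VTS) style mismatch function for joint additive and convolutive compensation to the static Gaussian means, in the log-spectral domain (that is, with the DCT matrix and its inverse omitted). It is an assumption for illustration only: it does not adapt variances or dynamic parameters, and it may differ from the adaptation formulas the text refers to. The function name is hypothetical.

```python
import numpy as np


def vts_static_mean(mu_x, mu_h, mu_n):
    """Compensate clean-speech Gaussian means for a channel (mu_h) and additive
    noise (mu_n) in the log-spectral domain using the mismatch function

        mu_y = mu_x + mu_h + log(1 + exp(mu_n - mu_x - mu_h)).

    mu_x may be (D,) or (num_gaussians, D); mu_h and mu_n are (D,) and broadcast.
    """
    z = mu_n - mu_x - mu_h
    # np.logaddexp(0, z) == log(exp(0) + exp(z)) == log(1 + e^z), computed stably.
    return mu_x + mu_h + np.logaddexp(0.0, z)
```

When the noise mean is far below the channel-shifted speech mean, the compensated mean approaches `mu_x + mu_h`; when it is far above, the compensated mean approaches `mu_n`, which matches the intuition that noise masks low-energy speech.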

At 610, the distorted speech utterance received at 604 can be decoded, wherein a subset of the plurality of compressed models is observed when decoding the speech utterance. For instance, not all compressed models in a speech recognition system will be used during decoding.

At 612, a second set of parameters corresponding to the compressed models is estimated based at least in part upon the decoding act of 610. For instance, the second set of parameters can include a convolutive distortion mean vector, static and dynamic additive distortion mean vectors, and static and dynamic additive distortion variances. Furthermore, Equations (31)-(36) can be used in connection with estimating the aforementioned second set of parameters.

Referring now to FIG. 7, the methodology 600 continues at 614, where each of the compressed models in the speech recognition system is adapted a second time based at least in part upon the second set of parameters estimated at 612. For instance, Equations (13)-(19) can be used in connection with adapting each of the plurality of compressed models in the speech recognition system. At 616, the distorted speech utterance is decoded using the models adapted at 614. The methodology 600 completes at 618.

Now referring to FIG. 8, an example methodology 800 for adapting parameters of compressed models in a speech recognition system is illustrated. The methodology 800 starts at 802, and at 804 a distorted speech utterance is received (e.g., through use of microphones in a mobile device). At 806, a first set of parameters corresponding to a plurality of compressed models used in the speech recognition system can be coarsely estimated. As noted above, the first set of parameters can include a convolutive distortion mean vector, an additive distortion mean vector, and/or a diagonal covariance matrix (described above).

At 808, each of the plurality of compressed models in the speech recognition system is adapted (e.g., parameters thereof are adapted) based at least in part upon the first set of parameters estimated at act 806. For example, Equations (13)-(19) can be used in connection with adapting each of the plurality of compressed models in the speech recognition system.

At 810, the distorted speech utterance received at 804 can be decoded, wherein a subset of the plurality of compressed models is observed when decoding the speech utterance. For instance, as noted above, not all compressed models in the speech recognition system will be used during decoding.

At 812, a second set of parameters corresponding to the compressed models is estimated based at least in part upon the decoding act of 810. For instance, the second set of parameters can include a convolutive distortion mean vector, static and dynamic additive distortion mean vectors, and static and dynamic additive distortion variances. Furthermore, Equations (31)-(36) can be used in connection with estimating the aforementioned second set of parameters.

Now turning to FIG. 9, the methodology 800 continues at 814, wherein a subset of the compressed models is adapted based at least in part upon the second set of parameters estimated at 812. For instance, the subset of the compressed models may be the compressed models used during the decoding act of 810 (and not compressed models that were unused during that act). Furthermore, Equations (13)-(19) can be used in connection with adapting the subset of compressed models.

At 816, the distorted speech utterance received at 804 can be decoded using at least one of the compressed models adapted at 814. The methodology 800 ends at 818.

Now referring to FIG. 10, an example methodology 1000 for adapting at least one compressed model used in a speech recognition system is illustrated. The methodology 1000 starts at 1002, and at 1004 a distorted speech utterance is received. At 1006, the distorted speech utterance is decoded using a first subset of a plurality of models in the speech recognition system. For instance, the speech recognition system can include a plurality of models that can be used in connection with decoding speech utterances, and for each received speech utterance a relatively small number of models may be observed during decoding.

At 1008, a first set of parameters corresponding to the first subset of the plurality of compressed models can be coarsely estimated. As noted above, the first set of parameters can include a convolutive distortion mean vector, an additive distortion mean vector, and/or a diagonal covariance matrix (described above).

At 1010, the first subset of the plurality of compressed models (e.g., compressed models observed during the decoding act of 1006) in the speech recognition system can be adapted (e.g., parameters thereof are adapted) based at least in part upon the first set of parameters estimated at act 1008. For example, Equations (13)-(19) can be used in connection with adapting the subset of compressed models in the speech recognition system.

At 1012, the received distorted speech utterance is decoded a second time using the plurality of compressed models in the speech recognition system. A second subset of the plurality of compressed models can be observed during the act of decoding, wherein the second subset may include the same models as the first subset or may include at least one model that is not in the first subset.

Now referring to FIG. 11, the methodology 1000 continues, and at 1014 a second set of parameters corresponding to the second subset of compressed models is estimated based at least in part upon the decoding of the speech utterance at 1012. For instance, the second set of parameters can include a convolutive distortion mean vector, static and dynamic additive distortion mean vectors, and static and dynamic additive distortion variances. Furthermore, Equations (31)-(36) or similar algorithms can be used in connection with estimating the aforementioned second set of parameters.

At 1016, the second subset of compressed models is adapted based at least in part upon the second set of parameters estimated at 1014. For instance, the second subset of compressed models can be adapted through use of Equations (13)-(19) or similar algorithms.

At 1018, the received distorted speech utterance is decoded using the plurality of compressed models, wherein a third subset of the plurality of compressed models can be observed during the act of decoding. For instance, the third subset of compressed models can be substantially similar to the first subset of compressed models and/or the second subset of compressed models. In another example, the third subset of compressed models can differ from the first subset of compressed models and/or the second subset of compressed models. The methodology 1000 completes at 1020.

Referring now to FIG. 12, an example methodology 1200 for adapting parameters in a compressed model is illustrated. For instance, the methodology 1200 can be encoded in a computer-readable medium and can be accessible by a processor. Further, the computer-readable medium can include a speech recognition system that comprises a plurality of compressed models.

The methodology 1200 starts at 1202, and at 1204 a distorted speech utterance is received. At 1206, an additive distortion mean vector corresponding to the distorted speech utterance is coarsely estimated based at least in part upon a portion of the received distorted speech utterance.

At 1208, parameters of a first subset of compressed models in the plurality of compressed models are adapted based at least in part upon the coarsely estimated additive distortion mean vector.

At 1210, the received distorted speech utterance is decoded using a second subset of compressed models in the plurality of compressed models.

At 1212, the additive distortion mean vector is re-estimated based at least in part upon the decoded speech utterance.
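Elsewhere in this description, re-estimation relies on Equations (31)-(36), which are not reproduced in this section. The following much simpler stand-in, offered only as an assumption-laden sketch, re-estimates the additive distortion mean vector as the sample mean of the frames that the decoding at 1210 aligned to a silence label. The label name `sil`, the frame-level alignment input, and the function name are hypothetical.

```python
import numpy as np


def reestimate_noise_mean(frames, frame_labels, prior_mu_n, silence_label="sil"):
    """Simplified re-estimate of the additive distortion mean vector.

    frames:       (T, D) features of the distorted utterance.
    frame_labels: length-T sequence of labels from the first decoding pass
                  (a hypothetical frame-level alignment).
    prior_mu_n:   previous (coarse) estimate, returned if no silence is found.
    """
    frames = np.asarray(frames, dtype=float)
    is_sil = np.array([lab == silence_label for lab in frame_labels], dtype=bool)
    if not is_sil.any():
        # Nothing was decoded as silence: keep the coarse estimate.
        return np.asarray(prior_mu_n, dtype=float)
    return frames[is_sil].mean(axis=0)
```

Unlike an EM-based re-estimate, this hard-decision average ignores speech frames entirely, so it cannot refine the convolutive distortion or the variances; it only illustrates how the decoded result can feed back into the distortion estimate.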

At 1214, parameters of the second subset of compressed models in the plurality of compressed models are adapted based at least in part upon the re-estimated additive distortion mean vector.

At 1216, the received distorted speech utterance is decoded using at least one compressed model in the second subset of compressed models. The methodology 1200 completes at 1218.
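The acts 1204-1216 can be summarized as a control-flow sketch. All of the callables passed in (`estimate_mu_n`, `adapt`, `decode`, `reestimate_mu_n`) are hypothetical placeholders for the estimation, adaptation, and decoding steps described above, and `decode` is assumed to return both a hypothesis and the set of model names it observed. In this sketch the second adaptation starts from the unadapted models with the re-estimated vector, which is an assumption rather than something the methodology specifies.

```python
from typing import Callable, Dict, List, Set, Tuple

import numpy as np

Models = Dict[str, np.ndarray]  # model name -> (hypothetical) parameter array


def recognize_with_adaptation(
    utterance: np.ndarray,
    models: Models,
    first_subset: Set[str],
    estimate_mu_n: Callable[[np.ndarray], np.ndarray],
    adapt: Callable[[np.ndarray, np.ndarray], np.ndarray],
    decode: Callable[[np.ndarray, Models], Tuple[List[str], Set[str]]],
    reestimate_mu_n: Callable[[np.ndarray, List[str], np.ndarray], np.ndarray],
) -> List[str]:
    # 1206: coarse estimate of the additive distortion mean vector.
    mu_n = estimate_mu_n(utterance)

    # 1208: adapt only the first subset; all other models stay unadapted.
    current = {name: adapt(m, mu_n) if name in first_subset else m
               for name, m in models.items()}

    # 1210: first decoding; `observed` plays the role of the second subset.
    hypothesis, observed = decode(utterance, current)

    # 1212: re-estimate the additive distortion mean from the decoded result.
    mu_n = reestimate_mu_n(utterance, hypothesis, mu_n)

    # 1214: re-adapt only the observed models, starting from unadapted ones.
    current = {name: adapt(models[name], mu_n) if name in observed
               else current[name] for name in models}

    # 1216: final decoding with the re-adapted models.
    hypothesis, _ = decode(utterance, current)
    return hypothesis
```

Passing every model name as `first_subset` corresponds to adapting all models before the first decoding (as at 608 and 808), while restricting the second adaptation to `observed` mirrors the selective adaptation at 814 and 1214.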

While some of the methodologies above describe adapting entire models, it is to be understood that the methodologies can be modified to adapt parameters within one or more models. For instance, when a speech utterance is decoded through use of an HMM, only a portion of the Gaussians in the HMM may be observed. Thus, the methodologies above can be modified such that subsets of Gaussians in an HMM are adapted (rather than all Gaussians in the HMM). Moreover, the methodologies can be modified such that only the portion of a model observed during decoding is adapted (e.g., to save computing and/or memory resources).

Now referring to FIG. 13, a high-level illustration of an example computing device 1300 that can be used in accordance with the systems and methodologies disclosed herein is illustrated. For instance, the computing device 1300 may be used in a system that supports speech recognition. In another example, at least a portion of the computing device 1300 may be used in a system that supports speech recognition in a mobile apparatus. The computing device 1300 includes at least one processor 1302 that executes instructions that are stored in a memory 1304. The instructions may be, for instance, instructions for implementing functionality described as being carried out by one or more components discussed above or instructions for implementing one or more of the methods described above. The processor 1302 may access the memory 1304 by way of a system bus 1306. In addition to storing executable instructions, the memory 1304 may also store one or more compressed models (such as Hidden Markov Models), adaption algorithms, re-estimation algorithms, and the like.

The computing device 1300 additionally includes a data store 1308 that is accessible by the processor 1302 by way of the system bus 1306. The data store 1308 may include executable instructions, compressed models, etc. The computing device 1300 also includes an input interface 1310 that allows external devices to communicate with the computing device 1300. For instance, the input interface 1310 may be used to receive instructions from an external computer device, speech utterances from an individual, etc. The computing device 1300 also includes an output interface 1312 that interfaces the computing device 1300 with one or more external devices. For example, the computing device 1300 may display text, images, etc. by way of the output interface 1312.

Additionally, while illustrated as a single system, it is to be understood that the computing device 1300 may be a distributed system. Thus, for instance, several devices may be in communication by way of a network connection and may collectively perform tasks described as being performed by the computing device 1300.

As used herein, the terms “component” and “system” are intended to encompass hardware, software, or a combination of hardware and software. Thus, for example, a system or component may be a process, a process executing on a processor, or a processor. Additionally, a component or system may be localized on a single device or distributed across several devices.

It is noted that several examples have been provided for purposes of explanation. These examples are not to be construed as limiting the hereto-appended claims. Additionally, it may be recognized that the examples provided herein may be permuted while still falling under the scope of the claims.

1. A speech recognition system comprising the following computer-executable components: a receiver component that receives a distorted speech utterance; and an adaptor component in communication with the receiver component, wherein the adaptor component selectively adapts parameters of a compressed model used to recognize at least a portion of the distorted speech utterance, wherein the adaptor component selectively adapts the parameters of the compressed model based at least in part upon the received distorted speech utterance.
 2. The speech recognition system of claim 1, wherein the compressed model is a compressed Hidden Markov Model.
 3. The speech recognition system of claim 1, wherein the adaptor component is configured to jointly compensate for additive and convolutive distortions in the received speech utterance when adapting the parameters of the compressed model.
 4. The speech recognition system of claim 1, further comprising a compressor component that receives an uncompressed model, compresses the uncompressed model, and outputs the compressed model.
 5. The speech recognition system of claim 4, wherein the compressor component uses subspace coding to compress the uncompressed model.
 6. The speech recognition system of claim 1, further comprising an expander component that expands the compressed model and wherein the adaptor component adapts parameters of the model based at least in part upon the received distorted speech utterance.
 7. The speech recognition system of claim 1, further comprising: a plurality of compressed models usable in connection with speech recognition; an expander component that expands the plurality of compressed models, wherein the adaptor component adapts parameters of each of the models based at least in part upon the received distorted speech utterance; and a decoder component that decodes the received distorted speech utterance using adapted parameters of at least one of the plurality of models, wherein after the decoder component has decoded the received speech utterance the adaptor component adapts only parameters of models observed by the decoder component when decoding the received distorted speech utterance.
 8. The speech recognition system of claim 1, further comprising: a plurality of compressed models usable in connection with speech recognition; an expander component that expands the plurality of compressed models; a decoder component that uses at least one of the plurality of compressed models to decode the received distorted speech utterance, wherein the adaptor component adapts parameters of the at least one of the plurality of compressed models used by the decoder component to decode the received speech utterance.
 9. The speech recognition system of claim 8, wherein subsequent to the adaptor component adapting the parameters of the at least one of the plurality of models used by the decoder component when decoding the received distorted speech utterance, the decoder component decodes the received distorted speech utterance using the at least one adapted model.
 10. The speech recognition system of claim 9, wherein subsequent to the decoder component decoding the received distorted speech utterance using the at least one adapted model, the adaptor component adapts the at least one model from the plurality of models a second time, and wherein the decoder component uses the twice adapted at least one model to decode the received distorted speech utterance.
 11. The speech recognition system of claim 1, wherein a mobile device comprises the receiver component and the adaptor component.
 12. A method comprising the following computer-executable acts: receiving a distorted speech utterance at a mobile device that includes a speech recognition system; and adapting parameters of a compressed model in the speech recognition system based at least in part upon the distorted speech utterance.
 13. The method of claim 12, wherein the distorted speech utterance includes additive distortions and convolutive distortions.
 14. The method of claim 12, wherein the compressed model is a compressed Hidden Markov Model.
 15. The method of claim 12, wherein the speech recognition system comprises a plurality of compressed models, and further comprising: coarsely estimating at least one parameter corresponding to the distorted speech utterance; and adapting parameters of a first subset of the plurality of compressed models based at least in part upon the at least one coarsely estimated parameter.
 16. The method of claim 15, further comprising: decoding the received speech utterance using the speech recognition system, wherein a second subset of compressed models of the plurality of compressed models is observed during decoding; and re-estimating the at least one parameter corresponding to the distorted speech utterance based at least in part upon the decoded speech utterance.
 17. The method of claim 16, further comprising: adapting parameters of the second subset of compressed models in the plurality of compressed models based at least in part upon the re-estimated at least one parameter corresponding to the distorted speech utterance.
 18. The method of claim 17, further comprising decoding the received speech utterance using the speech recognition system.
 19. The method of claim 16, wherein the first subset and the second subset of compressed models comprise substantially similar compressed models.
 20. A computer-readable medium residing in a mobile device that includes a speech recognition system comprising a plurality of compressed models, wherein the computer-readable medium comprises instructions that, when executed by a processor, perform the following acts: receive a distorted speech utterance; coarsely estimate an additive distortion mean vector corresponding to the distorted speech utterance based at least in part upon a portion of the received distorted speech utterance; adapt parameters of a first subset of compressed models in the plurality of compressed models based at least in part upon the coarsely estimated additive distortion mean vector; decode the received distorted speech utterance using a second subset of compressed models in the plurality of compressed models; re-estimate the additive distortion mean vector based at least in part upon the decoded speech utterance; adapt parameters of the second subset of compressed models in the plurality of compressed models based at least in part upon the re-estimated additive distortion mean vector; and decode the received distorted speech utterance using at least one compressed model in the second subset of compressed models.